Control Organoid samples from YFP

samples

AD4, AD8, AD12, and AD16 are the control samples.

/home/pjb40/jupytervenv/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
scanpy==1.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.16.5 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.23.1 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1
'/n/scratch3/groups/hsph/hbc/pjb40/scratch/TimeSeries_10X/data/velocyto_analysis/Only_controlN_Tumor/control_redo2'

Load matrix

  • Load adata and bdata from the velocity analysis; both are already filtered

Estimate the RNA velocity

Filtered out 248 genes that are detected 20 counts (shared).
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
Normalized count data: spliced, unspliced.
Skip filtering by dispersion since number of variables are less than `n_top_genes`
WARNING: Did not modify X as it looks preprocessed already.
computing neighbors
    finished (0:00:13) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:01) --> added 
    'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)
computing velocities
    finished (0:00:05) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph
    finished (0:00:47) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing velocity embedding
    finished (0:00:02) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)
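
The log above matches the usual scVelo sequence of preprocessing, moments, velocities, velocity graph, and embedding. A minimal sketch of calls that would produce these messages is given below. The function name `estimate_velocity` is hypothetical, and `n_top_genes`, `n_pcs`, and `n_neighbors` are assumed values, not taken from this notebook; only `min_shared_counts=20` is suggested by the "20 counts (shared)" log line.

```python
def estimate_velocity(adata, basis="umap"):
    """Run the scVelo steps whose log messages appear above (a sketch)."""
    # Imported inside the function so the sketch can be loaded without scVelo.
    import scvelo as scv

    # Filter genes by shared counts and normalize spliced/unspliced layers.
    # n_top_genes=2000 is an assumption; with fewer genes than that, scVelo
    # skips the dispersion filter, as the log above reports.
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)

    # Neighbor graph and first-order moments ('Ms', 'Mu'); sizes are assumed.
    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

    scv.tl.velocity(adata)        # adds adata.layers['velocity']
    scv.tl.velocity_graph(adata)  # adds adata.uns['velocity_graph']

    # Project velocities onto the embedding; adds adata.obsm['velocity_<basis>'].
    scv.tl.velocity_embedding(adata, basis=basis)
    return adata
```

Calling `estimate_velocity(adata)` on an object that already holds `spliced` and `unspliced` layers and an `X_umap` embedding should reproduce the order of steps in the log; the timings and filtered-gene counts depend on the data.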

Subcluster the cells

AnnData object with n_obs × n_vars = 9199 × 1281 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5', 'velocity_self_transition'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'velocity_gamma', 'velocity_r2', 'velocity_genes'
    uns: 'DAY_colors', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'sample_colors', 'umap', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'
    obsp: 'connectivities', 'distances'
computing velocities
    finished (0:00:04) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)

Phase portrait for AT2 genes

The plot shows the Hopx spliced/unspliced ratio across clusters and by velocity.
Print the top 5 velocity genes for each cluster.
ranking velocity genes
    finished (0:00:02) --> added 
    'rank_velocity_genes', sorted scores by group ids (adata.uns) 
    'spearmans_score', spearmans correlation scores (adata.var)
          0       1        2        3        4        5
0     Tinag   Igf1r    Map1b   Rnf150   Slc4a5  Gm15987
1  Atp6v0a4  Ptpn14    Gsta3   Sh3rf3  St3gal5   Swap70
2    Nckap5  Osbpl3     Ndnf    Pde8b   Osbpl6     Ctsl
3   Tmem164   Samd4   Cep128  Slc24a3    Arl5c     Cd44
4      Aox3  Tspan5  Khdrbs3    Cdkl5    Rbms3   Ptpn14
first five genes in cluster 1
0     Igf1r
1    Ptpn14
2    Osbpl3
3     Samd4
4    Tspan5
Name: 1, dtype: object
first five genes in cluster 0
0       Tinag
1    Atp6v0a4
2      Nckap5
3     Tmem164
4        Aox3
Name: 0, dtype: object
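
The ranking table printed above has one column per cluster and one row per rank, so the per-cluster lists are just its columns. In scVelo such a table is commonly built from `adata.uns['rank_velocity_genes']['names']`, though how this notebook built it is an assumption. The dependency-free sketch below reuses the printed values; `RANK_ROWS`, `top_genes`, and `shared_genes` are hypothetical names introduced for illustration.

```python
# Rows of the ranking table printed above: row i holds the rank-i gene of
# every cluster, and column j belongs to cluster j.
RANK_ROWS = [
    ["Tinag", "Igf1r", "Map1b", "Rnf150", "Slc4a5", "Gm15987"],
    ["Atp6v0a4", "Ptpn14", "Gsta3", "Sh3rf3", "St3gal5", "Swap70"],
    ["Nckap5", "Osbpl3", "Ndnf", "Pde8b", "Osbpl6", "Ctsl"],
    ["Tmem164", "Samd4", "Cep128", "Slc24a3", "Arl5c", "Cd44"],
    ["Aox3", "Tspan5", "Khdrbs3", "Cdkl5", "Rbms3", "Ptpn14"],
]


def top_genes(rows, cluster, n=5):
    """Return the first n ranked genes for one cluster (a column of the table)."""
    return [row[cluster] for row in rows[:n]]


def shared_genes(rows):
    """Map each gene listed for more than one cluster to those clusters."""
    seen = {}
    for row in rows:
        for cluster, gene in enumerate(row):
            seen.setdefault(gene, []).append(cluster)
    return {gene: sorted(cl) for gene, cl in seen.items() if len(cl) > 1}


if __name__ == "__main__":
    print(top_genes(RANK_ROWS, cluster=1))
    print(shared_genes(RANK_ROWS))
```

`top_genes(RANK_ROWS, 1)` returns the same five genes as the "cluster 1" listing above. `shared_genes` flags Ptpn14, which appears in the top five of both cluster 1 and cluster 5 and so may be less cluster-specific than the other markers.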
Plot the phase portrait for these genes.
Check the top velocity genes in cluster 2 and cluster 3.
AnnData object with n_obs × n_vars = 9199 × 1281 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5', 'velocity_self_transition'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'velocity_gamma', 'velocity_r2', 'velocity_genes', 'spearmans_score', 'velocity_score'
    uns: 'DAY_colors', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'sample_colors', 'umap', 'velocity_params', 'velocity_graph', 'velocity_graph_neg', 'rank_velocity_genes'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'
    obsp: 'connectivities', 'distances'

Velocity confidence score

--> added 'velocity_length' (adata.obs)
--> added 'velocity_confidence' (adata.obs)
--> added 'velocity_confidence_transition' (adata.obs)
louvain_r0.2                 0          1          2          3          4          5
velocity_length      14.942689  14.455003  12.336332  18.692337  13.600061  11.741586
velocity_confidence   0.934476   0.888265   0.875414   0.890040   0.920993   0.913493
This graph shows how the cells are connected to each other.
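
To read the per-cluster scores above at a glance, it can help to pull out the clusters with the lowest and highest values. The sketch below does this with the printed numbers only; the list names and the `extremes` helper are hypothetical. The values are presumably per-cluster means of the per-cell scores in `adata.obs`, but that is an assumption about how the table was produced.

```python
# Per-cluster scores copied from the table above (index = cluster label 0-5).
VELOCITY_LENGTH = [14.942689, 14.455003, 12.336332, 18.692337, 13.600061, 11.741586]
VELOCITY_CONFIDENCE = [0.934476, 0.888265, 0.875414, 0.890040, 0.920993, 0.913493]


def extremes(scores):
    """Return (cluster with the lowest score, cluster with the highest score)."""
    clusters = range(len(scores))
    lowest = min(clusters, key=lambda c: scores[c])
    highest = max(clusters, key=lambda c: scores[c])
    return lowest, highest


if __name__ == "__main__":
    for name, scores in [
        ("velocity_length", VELOCITY_LENGTH),
        ("velocity_confidence", VELOCITY_CONFIDENCE),
    ]:
        low, high = extremes(scores)
        print(f"{name}: lowest in cluster {low}, highest in cluster {high}")
```

With these numbers, velocity length is highest in cluster 3 and lowest in cluster 5, while velocity confidence is highest in cluster 0 and lowest in cluster 2. All confidence values lie between roughly 0.88 and 0.93.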

Calculate the pseudotime

computing terminal states
    identified 1 region of root cells and 2 regions of end points 
    finished (0:00:01) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)

This graph shows that the most dynamic cells start at the tip of the cluster, which is cluster 3 here. Moving down from there, the cells in the most developing stage are in cluster 1.

computing terminal states
    identified 1 region of root cells and 2 regions of end points 
    finished (0:00:01) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)
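
The "computing terminal states" messages above are what scVelo prints when root cells and end points are inferred, which also happens as part of its velocity pseudotime step. A minimal sketch of that step follows; the wrapper name `compute_pseudotime` is hypothetical, and whether this notebook called `scv.tl.velocity_pseudotime` directly or `scv.tl.terminal_states` first is an assumption.

```python
def compute_pseudotime(adata):
    """Compute scVelo's velocity pseudotime and return the per-cell values."""
    # Imported inside the function so the sketch can be loaded without scVelo.
    import scvelo as scv

    # Uses the velocity graph; root and end points are inferred along the way
    # when they are not supplied. Stores adata.obs['velocity_pseudotime'].
    scv.tl.velocity_pseudotime(adata)
    return adata.obs["velocity_pseudotime"]
```

The returned column can then be used to color the embedding, for example through the `color="velocity_pseudotime"` argument of `scv.pl.scatter`.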

Calculate the velocity markers again

This checks whether the velocity genes have changed.

ranking velocity genes
    finished (0:00:05) --> added 
    'rank_velocity_genes', sorted scores by group ids (adata.uns) 
    'spearmans_score', spearmans correlation scores (adata.var)
          0       1        2        3        4        5
0     Tinag   Igf1r    Map1b   Rnf150   Slc4a5  Gm15987
1  Atp6v0a4  Ptpn14    Gsta3   Sh3rf3  St3gal5   Swap70
2    Nckap5  Osbpl3     Ndnf    Pde8b   Osbpl6     Ctsl
3   Tmem164   Samd4   Cep128  Slc24a3    Arl5c     Cd44
4      Aox3  Tspan5  Khdrbs3    Cdkl5    Rbms3   Ptpn14

Pseudotime calculation

running PAGA
    finished (0:00:06) --> added
    'paga/transitions_confidence', connectivities adjacency (adata.uns)
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns)
      0      1  2      3  4  5
0     0      0  0      0  0  0
1     0      0  0   0.19  0  0
2  0.12      0  0  0.091  0  0
3     0      0  0      0  0  0
4     0      0  0  0.059  0  0
5     0  0.014  0      0  0  0
WARNING: Invalid color key. Using grey instead.
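
The PAGA transition-confidence matrix above is mostly zeros, so it is easier to read as a short list of its non-zero entries. The sketch below extracts them from the printed values; `PAGA_MATRIX` and `nonzero_entries` are hypothetical names. Whether rows are source clusters and columns are targets depends on whether the matrix was transposed before printing, which the output does not show, so the sketch reports (row, column, weight) without assuming a direction.

```python
# Transition confidences copied from the matrix printed above
# (row label and column label are cluster ids 0-5).
PAGA_MATRIX = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0.19, 0, 0],
    [0.12, 0, 0, 0.091, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0.059, 0, 0],
    [0, 0.014, 0, 0, 0, 0],
]


def nonzero_entries(matrix, threshold=0.0):
    """List (row, column, weight) for entries above threshold, strongest first."""
    entries = [
        (r, c, w)
        for r, row in enumerate(matrix)
        for c, w in enumerate(row)
        if w > threshold
    ]
    return sorted(entries, key=lambda e: e[2], reverse=True)


if __name__ == "__main__":
    for r, c, w in nonzero_entries(PAGA_MATRIX):
        print(f"row {r} / column {c}: {w}")
```

Four of the five non-zero entries involve cluster 3 (with clusters 1, 2, and 4), and the strongest is the cluster 1 / cluster 3 pair at 0.19. Raising `threshold`, for example to 0.05, drops the weak 0.014 entry between clusters 5 and 1.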

STOP HERE AND DO NOT RUN BELOW. REFER TO THE SUBCLUSTER PART 1 NOTEBOOK.